Forming High Density Arrays

Methods of forming high density arrays of oligonucleotides with a minimal number
of synthetic steps are known. The oligonucleotide analogue array can be synthesized on a
solid substrate by a variety of methods, including, but not limited to, light-directed chemical
coupling and mechanically directed coupling (see Pirrung et al., (1992) U.S. Patent No.
5,143,854; Fodor et al., (1998) U.S. Patent No. 5,800,992; Chee et al., (1998) U.S. Patent
No. 5,837,832).

In brief, the light-directed combinatorial synthesis of oligonucleotide arrays on a
glass surface proceeds using automated phosphoramidite chemistry and chip masking
techniques. In one specific implementation, a glass surface is derivatized with a silane
reagent containing a functional group, e.g., a hydroxyl or amine group blocked by a
photolabile protecting group. Photolysis through a photolithographic mask is used
selectively to expose functional groups which are then ready to react with incoming
5'-photoprotected nucleoside phosphoramidites. The phosphoramidites react only with those
sites which are illuminated (and thus exposed by removal of the photolabile blocking
group). Thus, the phosphoramidites only add to those areas selectively exposed from the
preceding step. These steps are repeated until the desired array of sequences has been
synthesized on the solid surface. Combinatorial synthesis of different oligonucleotide
analogues at different locations on the array is determined by the pattern of illumination
during synthesis and the order of addition of coupling reagents.

In addition to the foregoing, additional methods which can be used to generate an
array of oligonucleotides on a single substrate are described in Fodor et al., (1993) WO
93/09668. High density nucleic acid arrays can also be fabricated by depositing premade or
natural nucleic acids in predetermined positions. Synthesized or natural nucleic acids are
deposited on specific locations of a substrate by light-directed targeting and
oligonucleotide-directed targeting. Another embodiment uses a dispenser that moves from
region to region to deposit nucleic acids in specific spots.

Hybridization 

Nucleic acid hybridization simply involves contacting a probe and target nucleic acid
under conditions where the probe and its complementary target can form stable hybrid
duplexes through complementary base pairing (see Lockhart et al., (1999) WO 99/32660).
The nucleic acids that do not form hybrid duplexes are then washed away, leaving the
hybridized nucleic acids to be detected, typically through detection of an attached detectable
label. It is generally recognized that nucleic acids are denatured by increasing the
temperature or decreasing the salt concentration of the buffer containing the nucleic acids.
Under low stringency conditions (e.g., low temperature and/or high salt) hybrid
duplexes (e.g., DNA-DNA, RNA-RNA or RNA-DNA) will form even where the annealed
sequences are not perfectly complementary.

Thus specificity of hybridization is reduced at lower stringency. Conversely, at
higher stringency (e.g., higher temperature or lower salt) successful hybridization tolerates
fewer mismatches. One of skill in the art will appreciate that hybridization conditions may
be selected to provide any degree of stringency. In a preferred embodiment, hybridization is
performed at low stringency, in this case in 6x SSPE-T (0.005% Triton X-100) at 37°C, to
ensure hybridization, and then subsequent washes are performed at higher stringency (e.g.,
1x SSPE-T at 37°C) to eliminate mismatched hybrid duplexes. Successive washes may be
performed at increasingly higher stringency (e.g., down to as low as 0.25x SSPE-T at 37°C
to 50°C) until a desired level of hybridization specificity is obtained. Stringency can also
be increased by addition of agents such as formamide. Hybridization specificity may be
evaluated by comparison of hybridization to the test probes with hybridization to the various
controls that can be present (e.g., expression level control, normalization control, mismatch
controls, etc.).

In general, there is a tradeoff between hybridization specificity (stringency) and
signal intensity. Thus, in a preferred embodiment, the wash is performed at the highest
stringency that produces consistent results and that provides a signal intensity greater than
approximately 10% of the background intensity. In such an embodiment, the
hybridized array may be washed with solutions of successively higher stringency and read
between each wash. Analysis of the data sets thus produced will reveal a wash stringency
above which the hybridization pattern is not appreciably altered and which provides
adequate signal for the particular oligonucleotide probes of interest.
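
The wash-and-read procedure described above can be sketched in code. The following
is a minimal illustration only, assuming each read is a list of probe intensities and that
reads are ordered from lowest to highest stringency. The function names, the 5% tolerance
used to decide that a pattern is "not appreciably altered", and the way pattern change is
measured are hypothetical choices; only the floor of approximately 10% of background
comes from the text.

```python
def pattern_change(lower, higher):
    """Total absolute change in each probe's share of the overall signal."""
    lower_total, higher_total = sum(lower), sum(higher)
    return sum(abs(a / lower_total - b / higher_total)
               for a, b in zip(lower, higher))


def select_wash(reads, background, tolerance=0.05, min_fraction=0.10):
    """Pick a wash stringency from successive reads of one array.

    reads: list of (label, intensities), lowest stringency first.
    Returns the label of the first wash for which the next, more stringent
    wash leaves the pattern essentially unchanged, provided the mean signal
    still exceeds min_fraction * background. Returns None if no wash qualifies.
    """
    for (label, lower), (_, higher) in zip(reads, reads[1:]):
        mean_signal = sum(lower) / len(lower)
        if mean_signal <= min_fraction * background:
            continue  # too little signal left at this stringency
        if pattern_change(lower, higher) <= tolerance:
            return label
    return None
```

A caller would pass the reads collected between washes, for example labelled by SSPE-T
concentration, together with a measured background intensity.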

Signal Detection

The hybridized nucleic acids are typically detected by detecting one or more labels
attached to the sample nucleic acids. The labels may be incorporated by any of a number of
means well known to those of skill in the art (see Lockhart et al., (1999) WO 99/32660).

Databases

The present invention includes relational databases containing sequence information,
for instance for the genes of Tables 3-9, as well as gene expression information in various
liver tissue samples. Databases may also contain information associated with a given
sequence or tissue sample such as descriptive information about the gene associated with
the sequence information, or descriptive information concerning the clinical status of the
tissue sample, or the patient from which the sample was derived. The database may be
designed to include different parts, for instance a sequences database and a gene expression
database. Methods for the configuration and construction of such databases are widely
available, for instance, see Akerblom et al., (1999) U.S. Patent No. 5,953,727, which is herein
incorporated by reference in its entirety.
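
As an illustration of how a sequences part and a gene expression part might be laid out
in a relational database, the sketch below uses Python's built-in sqlite3 module. The table
names, column names and the query helper are hypothetical and are not taken from the
cited patent or from any database of the invention; they only show one plausible
arrangement under those assumptions.

```python
import sqlite3


def build_database():
    """Create an in-memory database with hypothetical sequence, sample
    and expression tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE sequences (
            gene_id     TEXT PRIMARY KEY,
            accession   TEXT,      -- e.g. an external database identifier
            description TEXT       -- descriptive information about the gene
        );
        CREATE TABLE samples (
            sample_id       TEXT PRIMARY KEY,
            tissue_type     TEXT,  -- e.g. normal, HCC, metastatic
            clinical_status TEXT   -- descriptive clinical information
        );
        CREATE TABLE expression (
            gene_id   TEXT REFERENCES sequences(gene_id),
            sample_id TEXT REFERENCES samples(sample_id),
            level     REAL,
            PRIMARY KEY (gene_id, sample_id)
        );
    """)
    return conn


def mean_expression_by_tissue(conn, gene_id):
    """Return {tissue_type: mean expression level} for one gene."""
    rows = conn.execute(
        """SELECT s.tissue_type, AVG(e.level)
           FROM expression AS e
           JOIN samples AS s ON s.sample_id = e.sample_id
           WHERE e.gene_id = ?
           GROUP BY s.tissue_type""",
        (gene_id,),
    ).fetchall()
    return dict(rows)
```

Keeping expression values in their own table, keyed by gene and sample, lets sequence
annotation and clinical information be updated independently of the measurements.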

The databases of the invention may be linked to an outside or external database. In a 
preferred embodiment, as described in Tables 3-9, the external database is GenBank and the
associated databases maintained by the National Center for Biotechnology Information 
(NCBI). 

Any appropriate computer platform may be used to perform the necessary
comparisons between sequence information, gene expression information and any other
information in the database or provided as an input. For example, a large number of
computer workstations are available from a variety of manufacturers, such as those
available from Silicon Graphics. Client-server environments, database servers and networks
are also widely available and appropriate platforms for the databases of the invention.

The databases of the invention may be used to produce, among other things, 
electronic Northerns to allow the user to determine the cell type or tissue in which a given 
gene is expressed and to allow determination of the abundance or expression level of a 
given gene in a particular tissue or cell. 

The databases of the invention may also be used to present information identifying
the expression level in a tissue or cell of a set of genes comprising at least one gene in
Tables 3-9, comprising the step of comparing the expression level of at least one gene in
Tables 3-9 in the tissue to the level of expression of the gene in the database. Such methods
may be used to predict the physiological state of a given tissue by comparing the level of
expression of a gene or genes in Tables 3-9 from a sample to the expression levels found in
tissue from normal liver, malignant liver or hepatocellular carcinoma. Such methods may
also be used in the drug or agent screening assays as described below.

Kits 

The invention further includes kits combining, in different combinations,
high-density oligonucleotide arrays, reagents for use with the arrays, signal detection and
array-processing instruments, gene expression databases and analysis and database management
software described above. The kits may be used, for example, to predict or model the toxic
response of a test compound, to monitor the progression of liver disease states, to identify
genes that show promise as new drug targets and to screen known and newly designed drugs
as discussed above.

The databases packaged with the kits are a compilation of expression patterns from
human or laboratory animal genes and gene fragments (corresponding to the genes of Tables
3-9). Data is collected from a repository of both normal and diseased animal tissues and
provides reproducible, quantitative results, i.e., the degree to which a gene is up-regulated or
down-regulated under a given condition.

The kits may be used in the pharmaceutical industry, where the need for early drug
testing is strong due to the high costs associated with drug development, but where
bioinformatics, in particular gene expression informatics, is still lacking. These kits will
reduce the costs, time and risks associated with traditional new drug screening using cell
cultures and laboratory animals. The results of large-scale drug screening of pre-grouped
patient populations (pharmacogenomics testing) can also be applied to select drugs with
greater efficacy and fewer side-effects. The kits may also be used by smaller biotechnology
companies and research institutes who do not have the facilities for performing such
large-scale testing themselves.

Databases and software designed for use with microarrays are discussed in
Balaban et al., U.S. Patent Nos. 6,229,911, a computer-implemented method for managing
information, stored as indexed tables, collected from small or large numbers of microarrays,
and 6,185,561, a computer-based method with data mining capability for collecting gene
expression level data, adding additional attributes and reformatting the data to produce
answers to various queries. Chee et al., U.S. Patent No. 5,974,164, disclose a
software-based method for identifying mutations in a nucleic acid sequence based on
differences in probe fluorescence intensities between wild type and mutant sequences that
hybridize to reference sequences.



Diagnostic Uses for the Liver Cancer Markers 

As described above, the genes and gene expression information provided in Tables
3-9 may be used as diagnostic markers for the prediction or identification of the malignant
state of the liver tissue. For instance, a liver tissue sample or other sample from a patient
may be assayed by any of the methods described above, and the expression levels from a
gene or genes from the Tables, in particular the genes in Tables 3-5, may be compared to the
expression levels found in normal liver tissue, tissue from metastatic liver cancer or
hepatocellular carcinoma tissue. Expression profiles generated from the tissue or other
sample that substantially resemble an expression profile from normal or diseased liver tissue
may be used, for instance, to aid in disease diagnosis. Comparison of the expression data,
as well as available sequence or other information, may be done by a researcher or
diagnostician or may be done with the aid of a computer and databases as described above.

Use of the Liver Cancer Markers for Monitoring Disease Progression 

As described above, the genes and gene expression information provided in Tables
3-9 may also be used as markers for the monitoring of disease progression, for instance, the
development of liver cancer. For instance, a liver tissue sample or other sample from a
patient may be assayed by any of the methods described above, and the expression levels in
the sample from a gene or genes from Tables 3-9 may be compared to the expression levels
found in normal liver tissue, tissue from metastatic liver cancer or hepatocellular carcinoma
tissue. Comparison of the expression data, as well as available sequence or other
information, may be done by a researcher or diagnostician or may be done with the aid of a
computer and databases as described above.

Use of the Liver Cancer Markers for Drug Screening

According to the present invention, the genes identified in Tables 3-9 may be used as
markers to evaluate the effects of a candidate drug or agent on a cell, particularly a cell
undergoing malignant transformation, for instance, a liver cancer cell or tissue sample. A
candidate drug or agent can be screened for the ability to stimulate the transcription or
expression of a given marker or markers (drug targets) or to down-regulate or counteract the
transcription or expression of a marker or markers. According to the present invention, one
can also compare the specificity of drugs' effects by looking at the number of markers
which the drugs affect and comparing them. More specific drugs will have fewer
transcriptional targets. Similar sets of markers identified for two drugs indicate a similarity
of effects.

Assays to monitor the expression of a marker or markers as defined in Tables 3-9
may utilize any available means of monitoring for changes in the expression level of the
nucleic acids of the invention. As used herein, an agent is said to modulate the expression
of a nucleic acid of the invention if it is capable of up- or down-regulating expression of the
nucleic acid in a cell.

In one assay format, gene chips containing probes to at least two genes from Tables
3-9 may be used to directly monitor or detect changes in gene expression in the treated or
exposed cell as described in more detail above. In another format, cell lines that contain
reporter gene fusions between the open reading frame and/or the 3' or 5' regulatory regions
of a gene in Tables 3-9 and any assayable fusion partner may be prepared. Numerous
assayable fusion partners are known and readily available including the firefly luciferase
gene and the gene encoding chloramphenicol acetyltransferase (Alam et al., (1990) Anal.
Biochem. 188, 245-254). Cell lines containing the reporter gene fusions are then exposed to
the agent to be tested under appropriate conditions and time. Differential expression of the
reporter gene between samples exposed to the agent and control samples identifies agents
which modulate the expression of the nucleic acid.

Additional assay formats may be used to monitor the ability of the agent to modulate
the expression of a gene identified in Tables 3-9. For instance, as described above, mRNA
expression may be monitored directly by hybridization of probes to the nucleic acids of the
invention. Cell lines are exposed to the agent to be tested under appropriate conditions and
time and total RNA or mRNA is isolated by standard procedures such as those disclosed in
Sambrook et al., (1989) Molecular Cloning - A Laboratory Manual, Cold Spring Harbor
Laboratory Press.

In another assay format, cells or cell lines are first identified which express the gene
products of the invention physiologically. Cells and/or cell lines so identified would be
expected to comprise the necessary cellular machinery such that the fidelity of modulation
of the transcriptional apparatus is maintained with regard to exogenous contact of agent
with appropriate surface transduction mechanisms and/or the cytosolic cascades. Such cell
lines may be, but are not required to be, derived from liver tissue. Further, such cells or cell
lines may be transduced or transfected with an expression vehicle (e.g., a plasmid or viral
vector) construct comprising an operable non-translated 5'-promoter-containing end of the
structural gene encoding the instant gene products fused to one or more antigenic fragments,
which are peculiar to the instant gene products, wherein said fragments are under the
transcriptional control of said promoter and are expressed as polypeptides whose molecular
weight can be distinguished from the naturally occurring polypeptides or may further
comprise an immunologically distinct tag. Such a process is well known in the art (see
Sambrook et al., (1989) Molecular Cloning - A Laboratory Manual, Cold Spring Harbor
Laboratory Press).

Cells or cell lines transduced or transfected as outlined above are then contacted with
agents under appropriate conditions; for example, the agent comprises a pharmaceutically
acceptable excipient and is contacted with cells comprised in an aqueous physiological
buffer such as phosphate buffered saline (PBS) at physiological pH, Eagle's balanced salt
solution (BSS) at physiological pH, PBS or BSS comprising serum or conditioned media
comprising PBS or BSS and serum incubated at 37°C. Said conditions may be modulated
as deemed necessary by one of skill in the art. Subsequent to contacting the cells with the
agent, said cells will be disrupted and the polypeptides of the lysate are fractionated such
that a polypeptide fraction is pooled and contacted with an antibody to be further processed
by immunological assay (e.g., ELISA, immunoprecipitation or Western blot). The pool of
proteins isolated from the "agent-contacted" sample will be compared with a control sample
where only the excipient is contacted with the cells, and an increase or decrease in the
immunologically generated signal from the "agent-contacted" sample compared to the
control will be used to distinguish the effectiveness of the agent.

Another embodiment of the present invention provides methods for identifying
agents that modulate the levels, concentration or at least one activity of a protein(s) encoded
by the genes in Tables 3-9. Such methods or assays may utilize any means of monitoring or
detecting the desired activity.

In one format, the relative amounts of a protein of the invention in a cell
population that has been exposed to the agent to be tested and in an unexposed
control cell population may be assayed. In this format, probes such as specific antibodies
are used to monitor the differential expression of the protein in the different cell
populations. Cell lines or populations are exposed to the agent to be tested under
appropriate conditions and time. Cellular lysates may be prepared from the exposed cell
line or population and a control, unexposed cell line or population. The cellular lysates are
then analyzed with the probe, such as a specific antibody.

Agents that are assayed in the above methods can be randomly selected or rationally
selected or designed. As used herein, an agent is said to be randomly selected when the
agent is chosen randomly without considering the specific sequences involved in the
association of a protein of the invention alone or with its associated substrates, binding
partners, etc. An example of randomly selected agents is the use of a chemical library or a
peptide combinatorial library, or a growth broth of an organism.

As used herein, an agent is said to be rationally selected or designed when the agent
is chosen on a nonrandom basis which takes into account the sequence of the target site
and/or its conformation in connection with the agent's action. Agents can be rationally
selected or rationally designed by utilizing the peptide sequences that make up these sites.
For example, a rationally selected peptide agent can be a peptide whose amino acid
sequence is identical to or a derivative of any functional consensus site.

The agents of the present invention can be, as examples, peptides, small molecules,
vitamin derivatives, as well as carbohydrates. Dominant negative proteins, DNA encoding
these proteins, antibodies to these proteins, peptide fragments of these proteins or mimics of
these proteins may be introduced into cells to affect function. "Mimic" as used herein refers
to the modification of a region or several regions of a peptide molecule to provide a
structure chemically different from the parent peptide but topographically and functionally
similar to the parent peptide (see Grant, (1995) in Molecular Biology and Biotechnology,
Meyers (editor), VCH Publishers). A skilled artisan can readily recognize that there is no
limit as to the structural nature of the agents of the present invention.

Without further description, it is believed that one of ordinary skill in the art can,
using the preceding description and the following illustrative examples, make and utilize the
compounds of the present invention and practice the claimed methods. The following
working examples, therefore, specifically point out the preferred embodiments of the present
invention, and are not to be construed as limiting in any way the remainder of the
disclosure.

EXAMPLES 

Example 1: Tissue Sample Acquisition and Preparation

Figure 1 outlines the experimental protocol used. Liver tissue samples were excised
and snap frozen in liquid nitrogen. The clinical data for each of the samples included in this
study are outlined in Table 1. The sample set was composed of eight samples of normal
liver tissue (N1-N8), five samples of metastatic adenocarcinoma arising from rectum
(designated M1 and M3) and colon (M2, M4 and M5) tissues and six samples of primary
hepatocellular carcinomas. Samples were named according to type of tissue:
HCC=hepatocellular carcinoma, M=metastatic, N=normal. Table 1 includes the TNM
classification (the American Joint Committee on Cancer's system of classifying cancers) of
the tissues used as samples, where T refers to the extent of the primary tumor, N refers to the
absence or presence and extent of regional lymph node metastasis, and M refers to the
absence or presence of distant metastasis. Numbers following T, N, and M refer to the size
of the primary tumor and the amount of vascular invasion, where 0=no evidence of tumor,
lymph node involvement or metastasis, 4=multiple tumors involved, and x=cannot be
assessed. Histopathologic grade (Table 1) is a qualitative assessment of differentiation of a
tumor, where G1=most differentiated and G4=undifferentiated. Clinical stage (Table 1)
characterizes the anatomic extent of disease in the patient from whom the sample was taken,
where I and II are early stages and III and IV are late stages.

With minor modifications, the sample preparation protocol followed the Affymetrix
GeneChip Expression Analysis Manual. Frozen tissue was first ground to powder using the
Spex Certiprep 6800 Freezer Mill. Total RNA was then extracted using Trizol (Life
Technologies). The total RNA yield for each sample (average tissue weight of 300 mg) was
200-500 µg. Next, mRNA was isolated using the Oligotex mRNA Midi kit (Qiagen). Since
the mRNA was eluted in a final volume of 400 µl, an ethanol precipitation step was required
to bring the concentration to 1 µg/µl. Using 1-5 µg of mRNA, double stranded cDNA was
created using the Superscript Choice system (Gibco-BRL). First strand cDNA synthesis
was primed with a T7-(dT24) oligonucleotide. The cDNA was then phenol-chloroform
extracted and ethanol precipitated to a final concentration of 1 µg/µl.

From 2 µg of cDNA, cRNA was synthesized according to standard procedures. To
biotin label the cRNA, nucleotides Bio-11-CTP and Bio-16-UTP (Enzo Diagnostics) were
added to the reaction. After a 37°C incubation for six hours, the labeled cRNA was cleaned
up according to the RNeasy Mini kit protocol (Qiagen). The cRNA was then fragmented
(5x fragmentation buffer: 200 mM Tris-Acetate (pH 8.1), 500 mM KOAc, 150 mM
MgOAc) for thirty-five minutes at 94°C.

55 µg of fragmented cRNA was hybridized on the human Hu35k set and the
HuGeneFL array for twenty-four hours at 60 rpm in a 45°C hybridization oven. The chips
were washed and stained with Streptavidin Phycoerythrin (SAPE) (Molecular Probes) in
Affymetrix fluidics stations. To amplify staining, SAPE solution was added twice with a
biotinylated anti-streptavidin antibody (Vector Laboratories) staining step in between.
Hybridization to the probe arrays was detected by fluorometric scanning (Hewlett Packard
Gene Array Scanner). Following hybridization and scanning, the microarray images were
analyzed for quality control, looking for major chip defects or abnormalities in
hybridization signal. After all chips passed QC, the data was analyzed using Affymetrix
GeneChip software (v3.0), and Experimental Data Mining Tool (EDMT) software (v1.0).

Example 2: Gene Expression Analysis

All samples were prepared as described and hybridized onto the Affymetrix
HuGeneFL array and the Human Hu35k set of arrays. Each chip contains 16-20
oligonucleotide probe pairs per gene or cDNA clone. These probe pairs include perfectly
matched sets and mismatched sets, both of which are necessary for the calculation of the
average difference. The average difference is a measure of the intensity difference for each
probe pair, calculated by subtracting the intensity of the mismatch from the intensity of the
perfect match. This takes into consideration variability in hybridization among probe pairs
and other hybridization artifacts that could affect the fluorescence intensities. Using the
average difference value that has been calculated, the GeneChip software then makes an
absolute call for each gene or EST.

The absolute call of present, absent or marginal is used to generate a Gene Signature,
a tool used to identify those genes that are commonly present or commonly absent in a
given sample set, according to the absolute call. For each set of samples, a median average
difference was calculated from the average differences of the individual samples within the
set. The median average difference must be greater than 150 to assure that the expression
level is well above the background noise of the hybridization. For the purposes of this
study, only the genes and ESTs with a median average difference greater than 150 have
been further studied in detail.
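
The two calculations described above can be sketched as follows. This is a simplified
illustration, assuming the average difference is the plain mean of the perfect match minus
mismatch intensities over a gene's probe pairs; the GeneChip software's actual computation
is not described here and may differ. The function names and the dictionary layout are
hypothetical; the threshold of 150 is the value stated in the text.

```python
from statistics import median


def average_difference(perfect_match, mismatch):
    """Mean of (PM - MM) over the probe pairs of one gene on one chip."""
    if not perfect_match or len(perfect_match) != len(mismatch):
        raise ValueError("need equal, non-empty PM and MM intensity lists")
    diffs = [pm - mm for pm, mm in zip(perfect_match, mismatch)]
    return sum(diffs) / len(diffs)


def genes_above_threshold(avg_diffs_by_gene, threshold=150):
    """Keep genes whose median average difference across a sample set
    is strictly greater than the threshold.

    avg_diffs_by_gene: {gene: [average difference in each sample]}
    Returns {gene: median average difference} for the genes retained.
    """
    medians = {gene: median(values)
               for gene, values in avg_diffs_by_gene.items()}
    return {gene: m for gene, m in medians.items() if m > threshold}
```

Using the median rather than the mean keeps a single unusually bright or dim sample
from deciding whether a gene is carried forward.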

The Gene Signature for one set of samples is compared to the Gene Signature of 
another set of samples to determine the Gene Signature Differential. This comparison 
identifies the genes that are consistently present in one set of samples and consistently 
absent in the second set of samples. 

The Gene Signature Curve is a graphic view of the number of genes consistently
present in a given set of samples as the sample size increases, taking into account the genes
commonly expressed among a particular set of samples, and discounting those genes whose
expression is variable among those samples. The curve is also indicative of the number of
samples necessary to generate an accurate Gene Signature. As the sample number
increases, the number of genes common to the sample set decreases. The curve is generated
using the positive Gene Signatures of the samples in question, determined by adding one
sample at a time to the Gene Signature, beginning with the sample with the smallest number
of present genes and adding samples in ascending order. The curve displays the sample size
required for the most consistency and the least amount of expression variability from
sample to sample. The point where this curve begins to level off represents the minimum
number of samples required for the Gene Signature. Graphed on the x-axis is the number of
samples in the set, and on the y-axis is the number of genes in the positive Gene Signature.
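
The construction of the curve can be sketched in a few lines. This is an illustrative
simplification, assuming that "common to the sample set" means present in every sample
added so far; the function name and the input layout (a mapping from sample name to the
set of genes called present) are hypothetical.

```python
def gene_signature_curve(present_by_sample):
    """Return [(number of samples, number of genes common to them)].

    Samples are added in ascending order of their number of present
    genes, starting with the sample that has the fewest.
    """
    ordered = sorted(present_by_sample.values(), key=len)
    curve = []
    common = None
    for count, genes in enumerate(ordered, start=1):
        # Intersect the running signature with the newly added sample.
        common = set(genes) if common is None else common & set(genes)
        curve.append((count, len(common)))
    return curve
```

Plotting the first element of each pair on the x-axis and the second on the y-axis gives
the curve; the point at which the second value stops falling appreciably suggests how
many samples are needed.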

Example 3: Gene Expression Analysis of Normal Liver Tissue

The gene expression patterns and Gene Signature were individually determined for
each sample set: eight samples with normal liver pathology, six samples whose pathology
indicated the primary malignancy to be hepatocellular carcinoma, and five samples whose
primary colorectal adenocarcinoma had metastasized to the liver. The Gene Signatures
obtained for the sample sets are shown in Figure 2.

The Gene Signature considers the present and absent genes alone, and does not take
into consideration those that have been called marginal. Table 2 shows the numbers of
present genes, called the positive Gene Signature, and the number of absent genes, called
the negative Gene Signature, for each of the three sets of samples.

The Gene Signature is the set of genes that are commonly present or commonly
absent in N-1 samples of a given sample set. The positive Gene Signature for the normal
liver tissues contains 6,213 genes and ESTs. This same set of normal samples did not show
any detectable level of expression of 24,900 genes. Many of the genes and ESTs in this
positive Gene Signature are housekeeping genes or structural genes that are not only
expressed in the liver, but are ubiquitously expressed in tissues throughout the body.
Within this positive Gene Signature are also those genes whose expression is specifically
restricted to normal liver tissue and those genes required for the liver to function at its
normal capacities. It is the group of genes unique to the liver whose expression levels are
most likely to change during tumorigenesis. Whether up-regulated or down-regulated or
turned completely on or turned completely off, the changes in expression of these vital
genes very likely contribute to the drastic changes in liver function caused by the
transformation of normal liver cells into cancerous cells.
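
The N-1 rule can be illustrated with a short sketch. It assumes that each gene has one
absolute call per sample, coded here as "P" (present), "A" (absent) or "M" (marginal),
and that a gene belongs to a signature when at least N-1 of its N calls agree; the coding,
the function name and the parameter name are hypothetical.

```python
def gene_signature(calls_by_gene, allowed_misses=1):
    """Split genes into positive and negative Gene Signatures.

    calls_by_gene: {gene: list of "P", "A" or "M" calls, one per sample}
    A gene is in the positive signature if it is called present in at
    least N - allowed_misses samples, and in the negative signature if
    it is called absent in at least that many. Marginal calls count
    toward neither.
    """
    positive, negative = set(), set()
    for gene, calls in calls_by_gene.items():
        needed = len(calls) - allowed_misses
        if calls.count("P") >= needed:
            positive.add(gene)
        elif calls.count("A") >= needed:
            negative.add(gene)
    return positive, negative
```

With eight normal samples, for example, a gene called present in seven of them and
marginal in the eighth would still fall in the positive Gene Signature.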

Example 4: Gene Expression Analysis of Malignant Liver Tissue

There are 8,479 genes and ESTs in the positive Gene Signature for the HCC tumors, 
and a total of 23,233 genes and ESTs are included in the negative Gene Signature of the 
HCC samples. This negative Gene Signature includes all the genes that have been 
completely turned off during tumorigenesis, as well as those genes that are not usually 
expressed in liver tissue. These results include a number of genes and ESTs that are not 
regularly expressed in liver tissues, but through the process of tumor production, their 
expression patterns have been dramatically altered from no detectable level of expression to 
some significant level of expression in comparison with the normal liver. 

The colorectal metastases in the liver commonly express 5,102 genes and ESTs, and
do not show expression of 30,455 additional genes and ESTs. As with the negative Gene
Signature for the HCC sample set, the genes included in this data set are generally not
expressed in liver tissue, whether tumor or normal tissue. The 5,102 genes and ESTs in the
sample set of metastatic tumors also include those genes with expression levels that have
been changed from off to on as a result of tumor formation.



Example 5: Analysis of Gene Expression Profiles 

A differential comparison of the genes and ESTs expressed in the normals and the 
two different types of liver tumors identifies a subset of the genes included in the positive 
Gene Signatures that are uniquely expressed in each sample set. This Gene Signature 
Differential highlights genes whose expression profiles have most dramatically changed in 
the transformation from normal to diseased liver cells. The parameters for these analyses 
were set to accommodate variation in expression in one of the eight normal samples and one 
of the six HCC samples or one of the five metastatic tumor samples, such that the genes 
categorized as unique to normal were called present by the software in seven of eight (87%) 
normal liver samples and were also called absent in five of six HCC (83%) or four of five 
(80%) metastatic liver tumor samples. Conversely, the genes categorized as unique to each 
set of tumors as compared to the normal livers were called present in five of six HCC (83%) 
or four of five (80%) metastatic tumor samples and absent in seven of eight normal livers 
(87%). 
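
The present/absent rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the software used in the study: the function name `unique_to_first_group`, the "P"/"A" call encoding, and the gene names are assumptions made for the example. It assumes one present/absent call per gene per sample and tolerates one discordant sample in each group, which reproduces the seven-of-eight, five-of-six and four-of-five thresholds.

```python
from typing import Dict, List


def unique_to_first_group(
    calls_a: Dict[str, List[str]],
    calls_b: Dict[str, List[str]],
    tolerance: int = 1,
) -> List[str]:
    """Return genes called present ("P") in all but `tolerance` samples of
    group A and called absent ("A") in all but `tolerance` samples of group B."""
    unique = []
    for gene, a_calls in calls_a.items():
        b_calls = calls_b.get(gene)
        if b_calls is None:
            continue  # gene not measured in the second group
        present_a = sum(1 for c in a_calls if c == "P")
        absent_b = sum(1 for c in b_calls if c == "A")
        if (present_a >= len(a_calls) - tolerance
                and absent_b >= len(b_calls) - tolerance):
            unique.append(gene)
    return unique


# Hypothetical calls: eight normal samples and six HCC samples.
normal = {
    "geneX": ["P"] * 7 + ["A"],      # present in 7 of 8 normals
    "geneY": ["P"] * 6 + ["A"] * 2,  # present in only 6 of 8 normals
    "geneZ": ["A"] * 8,              # absent in all normals
}
hcc = {
    "geneX": ["A"] * 5 + ["P"],      # absent in 5 of 6 HCC
    "geneY": ["A"] * 6,
    "geneZ": ["P"] * 5 + ["A"],      # present in 5 of 6 HCC
}

print(unique_to_first_group(normal, hcc))  # genes unique to normal
print(unique_to_first_group(hcc, normal))  # genes unique to HCC
```

Swapping the two arguments gives the converse comparison, so the same function covers both the "unique to normal" and the "unique to tumor" categories.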

The Gene Signature Differential comparing the normal livers to those with 
metastatic tumors identified a total of 903 sequences expressed only in normal liver tissue. 
The number of genes or ESTs that meet the median average difference minimum of 150 is 
449, of which 289 are genes and 160 are ESTs. The remaining ESTs and genes may be 
indistinguishable from the background noise of the hybridization. The same 
comparison of normals versus metastatic tumors demonstrates that in the metastatic tumor 
samples there are 296 uniquely expressed sequences. Those that meet the median average 
difference minimum requirement are 83 genes and 72 ESTs. Those genes and ESTs 
expressed in metastatic and not in normal liver tissue are shown in Table 9A and those 
present in normal liver tissue and not metastatic tissue are shown in Table 9B. Numerous 
genes with differing expression levels in metastatic liver tumor tissue compared to normal 
tissue were identified. The fifteen genes whose expression level was most different in 
metastatic as compared to normal tissue are shown in Table 4. Those with the most increased 
expression are in Table 4A and those with the most decreased expression are in Table 4B. 
Expression levels were determined by comparing the mean expression values of individual 
genes in tumor and normal liver samples. Fold change was calculated as a ratio with a p 
value given as a measure of statistical significance. Fold change is considered significant 
for a given gene or EST when it is greater than 3.0 with a p value <0.05. Only the 
characterized genes have been listed; the ESTs with similar fold changes are not presented 
here. Asterisk (*) in Table 4 denotes those genes that were also identified in the Gene 
Signature differential between metastatic liver carcinoma and normal liver tissue. A 
complete listing of all the genes and ESTs with at least a three-fold change in expression 
is shown in Table 6. Table 6A contains those genes and ESTs whose expression level 
increased in metastatic tissue relative to normal tissue and Table 6B contains those genes 
and ESTs whose expression level decreased. 
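
The fold-change criterion stated above (a ratio of mean expression values greater than 3.0 with a p value below 0.05) might be expressed as follows. This is a minimal sketch under stated assumptions: the text does not say which statistical test produced the p values, so the p value is taken as an input rather than computed; reporting the ratio as larger mean over smaller mean with an "up"/"down" label is a convention assumed here to match phrases such as "26.9 fold lower"; and the function names and expression values are hypothetical.

```python
from statistics import mean
from typing import Sequence, Tuple


def fold_change(tumor: Sequence[float], normal: Sequence[float]) -> Tuple[float, str]:
    """Ratio of mean expression values, reported as (fold, direction).

    The fold is always >= 1: larger mean divided by smaller mean.
    Direction is "up" when the tumor mean is the larger one, else "down".
    """
    t, n = mean(tumor), mean(normal)
    if t <= 0 or n <= 0:
        raise ValueError("mean expression values must be positive")
    if t >= n:
        return t / n, "up"
    return n / t, "down"


def is_significant(fold: float, p_value: float,
                   min_fold: float = 3.0, alpha: float = 0.05) -> bool:
    """Apply the stated cutoff: fold change > 3.0 and p value < 0.05."""
    return fold > min_fold and p_value < alpha


# Hypothetical expression values for one gene.
fold, direction = fold_change(tumor=[600, 700, 800], normal=[100, 150, 50])
print(fold, direction, is_significant(fold, p_value=0.002))
```

Note that both comparisons are strict, so a gene with a fold change of exactly 3.0 or a p value of exactly 0.05 would not pass.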

The Gene Signature Differential between the normal liver samples and the HCC 
samples identifies a total of 47 sequences uniquely expressed in the normals, 23 of which 
meet the median average difference minimum of 150; 13 of these are named genes and 10 are 
ESTs. When comparing the expression of the HCC samples with the normal livers, there are 
243 genes and ESTs expressed only in the HCC samples. 
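
The median average difference minimum of 150 used in these comparisons amounts to a simple filter. The sketch below is illustrative only: it assumes the median is taken over the per-sample average difference values of the group in which a sequence is expressed, and the function name, sequence names and values are hypothetical.

```python
from statistics import median
from typing import Dict, List


def filter_by_median(
    avg_diffs: Dict[str, List[float]], minimum: float = 150.0
) -> Dict[str, List[float]]:
    """Keep sequences whose median average difference meets the minimum.

    Sequences below the cutoff are treated as potentially indistinguishable
    from hybridization background.
    """
    return {
        name: values
        for name, values in avg_diffs.items()
        if median(values) >= minimum
    }


# Hypothetical average difference values across eight samples.
candidates = {
    "seqA": [120, 180, 200, 160, 90, 155, 170, 210],  # median 165
    "seqB": [40, 60, 300, 20, 10, 80, 55, 70],        # median 57.5
}

kept = filter_by_median(candidates)
print(sorted(kept))
```

Using the median rather than the mean keeps a single outlying sample, such as the value of 300 for `seqB`, from carrying a sequence over the cutoff.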

Those genes and ESTs expressed in HCC and not in normal liver tissue are shown in 
Table 8A and those present in normal liver tissue and not HCC are shown in Table 8B. 
Numerous genes with differing expression levels in HCC compared to normal tissue were 
identified. The fifteen genes whose expression level was most different in HCC as 
compared to normal tissue are shown in Table 3. Those with the most increased expression 
are in Table 3A and those with the most decreased expression are in Table 3B. Expression 
levels were determined by comparing the mean expression values of individual genes in 
tumor and normal liver samples. Fold change was calculated as a ratio with a p value given 
as a measure of statistical significance. Fold change is considered significant for a given 
gene or EST when it is greater than 3.0 with a p value <0.05. Only the characterized genes 
have been listed; the ESTs with similar fold changes are not presented here. Asterisk (*) in 
Table 3 denotes those genes that were also identified in the Gene Signature differential 
between hepatocellular carcinoma and normal liver tissue. A complete listing of all the 
genes and ESTs with at least a three-fold change in expression is shown in Table 7. Table 
7A contains those genes and ESTs whose expression level increased in hepatocellular 
carcinoma tissue relative to normal tissue and Table 7B contains those genes and ESTs 
whose expression level decreased. 

Analysis of the sample sets identified 24 ESTs and 42 genes that are expressed in both 
metastatic liver tumors and hepatocellular carcinomas, but not in normal liver tissues. The 
fifteen genes with the greatest increase in expression level in both types of cancer are 
shown in Table 5. Expression levels were determined by comparing the mean expression 
values of individual genes in tumor and normal liver samples. The mean expression value for 
HCC and metastatic carcinomas was greater than 250, and the table includes only those genes 
that showed a fold change greater than 3 with significant p values for both sets of tumors. 
No detectable level of expression was found in the normal liver tissues for these genes. 
Only the characterized genes have been listed; the ESTs with similar fold changes that are 
unique to the tumors are not presented here. 
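
The selection of genes up-regulated in both tumor types can be pictured as applying the same three cutoffs to each tumor set and keeping the genes that pass in both. This is a hedged sketch, not the inventors' procedure: "significant p values" is assumed to mean p < 0.05 as elsewhere in this Example, and the `TumorStats` record, the function names, and all gene names and numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TumorStats:
    mean_expression: float  # mean expression value in the tumor set
    fold_change: float      # tumor mean relative to normal
    p_value: float


def passes(stats: TumorStats, min_mean: float = 250.0,
           min_fold: float = 3.0, alpha: float = 0.05) -> bool:
    """Mean expression > 250, fold change > 3, and p value < 0.05 (assumed)."""
    return (stats.mean_expression > min_mean
            and stats.fold_change > min_fold
            and stats.p_value < alpha)


def shared_markers(hcc: Dict[str, TumorStats],
                   metastatic: Dict[str, TumorStats]) -> List[str]:
    """Genes that pass all cutoffs in both tumor sets, sorted by name."""
    common = hcc.keys() & metastatic.keys()
    return sorted(g for g in common if passes(hcc[g]) and passes(metastatic[g]))


# Hypothetical statistics for three genes.
hcc_stats = {
    "geneA": TumorStats(900.0, 12.0, 0.001),
    "geneB": TumorStats(300.0, 4.0, 0.20),   # p value not significant
    "geneC": TumorStats(200.0, 8.0, 0.01),   # mean expression too low
}
met_stats = {
    "geneA": TumorStats(1500.0, 25.0, 0.0005),
    "geneB": TumorStats(800.0, 9.0, 0.01),
    "geneC": TumorStats(700.0, 15.0, 0.002),
}

print(shared_markers(hcc_stats, met_stats))
```

A gene must clear every cutoff in both sets, so `geneB` and `geneC` are excluded even though each passes in the metastatic set.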

Differential gene expression patterns between normal liver samples and 
hepatocellular carcinomas and between normal livers and metastatic liver tumors were 
examined. Genes uniquely expressed by each of the groups individually were identified, as 
well as those genes that are commonly expressed among liver tumors, whether primary 
hepatocellular carcinomas or metastatic liver tumors. 

Example 6: Association of Liver Cancer with Specific Gene Expression 

The present inventors have closely examined a number of the tumor-expressing 
genes to determine if their expression patterns correlate with previous reports published in 
the literature, and to define a logical relationship between the gene and 
hepatocarcinogenesis. A number of genes that have previously been associated with either 
liver cancer or other types of cancers were identified, as well as numerous genes that have 
not been linked to cancers in any previous studies. 

A total of 842 genes and ESTs that are up-regulated in hepatocellular carcinomas 
relative to normal liver tissue were identified. One such gene is PTTG1, pituitary 
tumor-transforming gene 1, or securin, an oncogene that inhibits sister chromatid separation 
during anaphase. Normal tissues show little or no PTTG1 expression, but high levels of 
expression have been associated with various tumors, including liver tumors, and carcinoma 
cell lines. Overexpression in NIH3T3 cells resulted in transformation, and these cells 
caused the formation of tumors when injected into mice. The mechanism by which this 
tumorigenic activity takes place is postulated to be through the missegregation of sister 
chromatids, resulting in aneuploidy and, therefore, genetic instability. Our data further 
support this overexpression of PTTG1 in hepatocellular carcinoma, with a fold change of 
10.7 (P=0.00052), and no detectable level of expression in normal tissues, as identified by 
the differential comparison of the consensus patterns of gene expression of these two sample 
sets. 

Galectin 3, LGALS3, one of a family of beta-galactoside-binding animal lectins, is 
significantly overexpressed both in primary hepatocellular carcinoma and metastatic liver 
carcinomas with fold changes of 6.8 (P=0.00103) and 27.1 (P=0.00001), respectively. 
Expression of LGALS3 has been associated with tumor growth, progression, and metastasis, 
as well as cell-cell and cell-matrix interactions and inflammatory processes. Although 
expression studies have revealed no detectable level of galectin-3 in normal liver cells, 
samples from patients with hepatocellular carcinoma revealed considerable levels of 
LGALS3 expression. The abnormal expression of this lectin may be an early event in the 
process of transformation of normal cells to tumor cells, or it may impart an increased 
capacity for these tumor cells to survive and proliferate. Consistent with the reports in the 
art, an increased expression level was found in both types of tumor, but higher 
concentrations of galectin-3 were observed in liver metastases from colorectal tumors than 
in the primary HCC tumors. 

Another gene that is overexpressed in both hepatocellular carcinoma and metastatic 
colorectal adenocarcinomas with fold changes of 12.2 (P=0.00169) and 58.0 (P=0.00063), 
respectively, is solute carrier family 2, member 3, or glucose transporter 3 (GLUT3). It is 
one of a family of transmembrane proteins that function as facilitative glucose transporters, 
and it has a unique specificity for brain and neuronal tissues. Glucose uptake and 
metabolism are known to be increased in carcinoma cells compared to normal cells. 
Glucose transporter expression may be elevated in response to the increase in glucose 
utilization seen in actively proliferating cells, like those of tumors. Conversely, the high 
levels of glucose transporter expression may be responsible for the enhanced influx of 
glucose into the tumor cells. Various reports have indicated increased expression of one or 
more of the family of glucose transporters in malignancies, including those of the brain, 
esophagus, colon, pancreas, liver, breast, lung, bladder, ovary, testis, skin, head and neck, 
kidney, and gastric tumors. It has been reported that metastatic liver carcinomas have even 
higher levels of GLUT3 expression than primary tumors. Consistent with previous studies, 
the current data confirm the significant overexpression of GLUT3 both in primary liver 
cancer, hepatocellular carcinoma, and in tumors that have metastasized from the colon and 
rectum. 




One of the significantly underexpressed genes identified by comparing the 
expression profiles of hepatocellular carcinomas and metastatic liver tumors with that of 
normal liver tissue is metallothionein 1L. The expression level in HCC is 26.9 fold lower 
than that of normal (P=0.00999), and in metastatic colorectal adenocarcinomas it is 
down-regulated 66.5 fold (P=0.00415). Metallothioneins are heavy metal binding proteins that 
are involved in detoxification of metals, zinc and copper metabolism, and cellular adaptation 
mechanisms, and may be involved in regulating apoptosis. Colorectal adenocarcinoma that 
has metastasized to the liver has been specifically reported to express less metallothionein 
than normal liver tissue. Comparison of the consensus patterns of gene expression between 
metastatic liver samples and normal liver samples shows no significant level of MT1L 
expression in the tumors. Furthermore, additional work has determined that human 
hepatocellular carcinomas contain much lower levels of metallothioneins than normal liver 
tissue, and that this decrease correlates with the degree of differentiation and concentrations 
of copper and zinc in the cells. By comparing the expression profiles of hepatocellular 
carcinoma and normal liver tissue, this significant reduction in MT1L expression in HCC 
was confirmed. 

A number of enzymes belonging to the family of cytochrome P450s are drastically 
underexpressed in the two sets of liver tumors in comparison with the normal liver tissue. 
For example, expression of CYP2A6 is decreased in HCC with a fold change of 14.2 
(P=0.0307), and in metastatic tumors with a fold change of 69.9 (P=0). CYP8B1 is 
down-regulated 19.3 fold (P=0.00807) in HCC and 65.1 fold (P=0.0039) in liver metastases. In 
addition to these commonly down-regulated cytochrome P450s, in HCC samples CYP2B is 
underexpressed 17.9 fold (P=0.01469), and in the metastatic liver tumors CYP2C9 and 
CYP2A7 are underexpressed 84.7 fold (P=0.00327) and 72.0 fold (P=0), respectively. 
Several of these genes are also identified by the differential comparison between expression 
profiles of tumor and normal, confirming the significant decrease in expression in tumor 
tissues. Many of these P450 enzymes, which are expressed in normal liver, are critical 
players in the metabolism of carcinogens, drugs, and other chemical compounds. 

In addition to genes that are underexpressed in metastatic adenocarcinomas in the 
liver, more than 1000 genes and ESTs that are overexpressed specifically in these tumors 
were identified. Two of the most highly up-regulated are claudin 4, also known as 
Clostridium perfringens enterotoxin receptor 1 (fold change 84.4, P=0) and occludin (fold 
change 43.1, P=0). Both of these genes encode tight junction proteins, responsible for the 
formation and maintenance of continuous seals around epithelial cells to form a physical 
barrier that blocks the free passage of water and solutes through the paracellular space. 
More specifically, claudin-4 is one member of a family of transmembrane proteins that 
comprise tight junction strands, and occludin is a cell adhesion molecule. Claudins likely 
function as paracellular channels, regulating the flow of ions and solutes into and out of the 
paracellular space. Tight junction proteins also contribute to the regulation of the cellular 
processes of cell growth and differentiation. Permeability of tight junctions has been 
associated with tumor formation, where a breakdown in the barrier function of tight 
junctions allows an increase in the cellular permeability. This breakdown then opens the 
tight junction barrier, permitting invasion by tumor cells. It has been reported that tight 
junctions of colon tumors leak more than do the tight junctions of normal colon. A 
complete loss of tight junction function and a loss of cell-cell contact growth control has 
been seen in cells that had been transfected with oncogenic Raf-1, and expression levels of 
occludin and another claudin are lower in these cells. Occludin expression has been 
up-regulated in vitro by the addition of various fatty acids that have anti-cancer effects, 
decreasing the paracellular permeability. The extreme down-regulation of occludin and 
claudin-4 in metastatic liver tumors is strongly supported by the reports of tight junction 
breakdown in tumor tissues. 

The present study identified 93 significantly up-regulated genes in both primary 
HCC and metastatic liver tumors that were not found to have any detectable level of 
expression in the normal samples. Serine protease inhibitor, Kazal type 1 (SPINK1), also 
called pancreatic secretory trypsin inhibitor (PSTI) or tumor-associated trypsin inhibitor 
(TATI), is one such gene. It is highly expressed in the cells of normal pancreas and in the 
mucosa of the gastrointestinal tract where it offers protection from proteolytic breakdown. 
A marked increase in expression is seen in various pancreatic diseases and in tumors of 
different tissues, including gastric carcinomas, colorectal cancers, and other neoplastic 
tissues. This increase is presumably due to the elevated expression of trypsin in the tumors, 
and not related to amplification or rearrangements within the gene. SPINK1 is also 
considered a valuable marker for a number of solid tumors. An elevation of SPINK1 in the 
blood of patients with hepatocellular carcinoma has been seen. Furthermore, it has been 
suggested that the level of expression correlates with the extent of tumor, such that this 
heightened expression level could be indicative of HCC under certain conditions. In keeping 
with this report of overexpression in these tumors, the present expression data show the 
levels of expression of this gene in HCC samples to be 28.9 times higher than normal 
(P=0.00003), and in metastatic liver tumors the expression level is 9.8 times higher than 
normal (P=0.03697). 

Midkine is one of a family of heparin-binding growth factors, inducible by retinoic 
acid, and is actively involved in cell-cell interactions and angiogenesis. The expression 
pattern of midkine is highly restricted in normal adult tissues, and no expression has been 
reported in normal adult liver, although its expression is required during embryogenesis for 
normal development. However, it is expressed in moderate to high levels in many tumors, 
including Wilms' tumors of the kidney and stomach, colon, pancreas, lung, esophagus, breast, 
and liver tumors. The present data confirm these reports, showing a significant 
overexpression of midkine in hepatocellular carcinoma samples (fold change 9.9, 
P=0.02104) and in liver metastases (fold change 10.4, P=0.01818), but no noticeable 
expression in normal liver. 

Stathmin, leukemia-associated phosphoprotein 18, is a phosphoprotein whose 
expression pattern and phosphorylation status are controlled by extracellular signals 
responsible for the regulation of the processes of cell proliferation and differentiation. It is 
also involved in the regulation of cell division via the destabilization of microtubules. 
When comparing expression levels between non-malignant tissues and malignant tissues, 
the tumors generally show a significant up-regulation of this phosphoprotein, specifically 
lymphomas, leukemias, breast and prostate tumors. One reason proposed for this elevated 
expression in cancer cells is the dissimilarity in the rates of cell proliferation and states of 
differentiation between normal and tumor cells. In both HCC samples and metastatic 
adenocarcinomas, significant up-regulation of stathmin, 9.4 fold in HCC (P=0.00015) and 
4.8 fold in metastatic tumors (P=0.00514), was seen. 

Both the genes and ESTs described here will provide valuable information for the 
identification of new drug targets against liver carcinomas, and that information may be 
extended for use in the study of carcinogenesis in other tissues. These sequences may be 
used in the methods of the invention or may be used to produce the probes and arrays of the 
invention. 

Although the present invention has been described in detail with reference to 
examples above, it is understood that various modifications can be made without departing 
from the spirit of the invention. Accordingly, the invention is limited only by the following 
claims. All cited patents, applications and publications referred to in this application are 
herein incorporated by reference in their entirety. 